Introduction

Achieving cholera elimination requires adequate, representative data to inform intervention policies. In 2014, IEDCR and icddr,b established a national cholera surveillance system in Bangladesh. However, we do not know whether high-risk cholera areas are captured by this system.

In 1988, the US Centers for Disease Control and Prevention (CDC) published Guidelines for Evaluating Public Health Surveillance Systems (updated in 2001), which aimed to standardize evaluations of public health surveillance systems efficiently and effectively using a series of broad characteristics, including representativeness and sensitivity. Representativeness is defined as the accurate description of cases over time and their distribution in a population by place and person. Sensitivity is defined as the proportion of true cases or outbreaks detected by the surveillance system. Yet measuring both indicators and validating the data collected by the surveillance system requires external data for comparison, in order to determine the true incidence of disease in the population. Typically, such data include medical records and registries, which rarely exist or are incomplete in low-resource settings.

Given recent estimates of V. cholerae seroincidence from a nationally-representative cross-sectional serosurvey conducted in 2015, we sought to describe the representativeness and sensitivity of the cholera surveillance system using geographically-resolved infection data. We identify how well the Bangladesh national cholera surveillance system captures

  1. the Bangladeshi population
  2. the Bangladeshi population living in high-, moderate-, and low-risk cholera areas

to determine which surveillance sites may be most efficiently used to deliver new interventions.

Hospitals in the national cholera surveillance system and the icddr,b Dhaka hospital are the only healthcare facilities that regularly perform laboratory confirmation of V. cholerae in Bangladesh. Consequently, it is important to understand how well areas with high cholera risk are captured by the surveillance system.

Methods

Identifying the cholera surveillance zone

There are 23 hospital sites that perform laboratory confirmation of V. cholerae in Bangladesh (Table 1).

Table 1: Sentinel hospital IDs and locations.

ID | Hospital | Division | Type
1 | District Hospital Norshingdi | Dhaka | district
2 | Adhunik Sadar Hospital Habiganj | Sylhet | district
3 | District Sadar Hospital Cox’s Bazar | Chittagong | district
4 | Adhunik Sadar Hospital Naogaon | Rajshahi | district
5 | General Hospital Patuakhali | Barisal | tertiary
6 | Adhunik Sadar Hospital Thakurgaon | Rangpur | district
7 | District Sadar Hospital Satkhira | Khulna | district
8 | Dhaka Medical College Dhaka | Dhaka | tertiary
9 | Uttara Adhunik Medical College Hospital | Dhaka | tertiary
10 | Bangladesh Institute of Tropical and Infectious Diseases Chittagong | Chittagong | tertiary
11 | General Hospital Tangail | Dhaka | district
12 | General Hospital Narayanganj | Dhaka | district
13 | Sadar Hospital Chuadanga | Khulna | district
14 | General Hospital Meherpur | Khulna | district
15 | General Hospital Comilla | Chittagong | district
16 | Upazila Health Complex Chaugachha Jessore | Khulna | subdistrict
17 | General Hospital Kusthia | Khulna | district
18 | Upazila Health Complex Madan | Mymensingh | subdistrict
19 | Upazila Health Complex Chhatak Sunamganj | Sylhet | subdistrict
20 | Upazila Health Complex Mathbariya | Barisal | subdistrict
21 | Upazila Health Complex Bakerganj | Barisal | subdistrict
22 | Health Complex Shibganj | Rajshahi | subdistrict
23 | icddr,b Cholera Hospital | Dhaka | icddrb

In the absence of better data on health care utilization at the sentinel surveillance hospitals, we assumed that the catchment areas of subdistrict, district, tertiary care, and icddr,b Dhaka hospitals could be defined by radii of 10, 20, 30, and 30 km, respectively, around each hospital (Figure 1). We refer to the joint set of buffers around all 23 hospitals as the “cholera surveillance zone” in Bangladesh.
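The catchment assumption can be illustrated with a minimal sketch. It assumes that a location belongs to the surveillance zone if its great-circle distance to at least one hospital is within that hospital's buffer radius. The coordinates, the function names, and the use of the haversine formula are illustrative assumptions, not the geoprocessing used in this analysis.

```python
import math

# Buffer radius (km) by hospital type, as stated in the text.
BUFFER_KM = {"subdistrict": 10, "district": 20, "tertiary": 30, "icddrb": 30}

EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_surveillance_zone(lat, lon, hospitals):
    """True if the point lies within the buffer of at least one hospital."""
    return any(
        haversine_km(lat, lon, h["lat"], h["lon"]) <= BUFFER_KM[h["type"]]
        for h in hospitals
    )


# Hypothetical coordinates for illustration only; not the real hospital locations.
hospitals = [
    {"id": "A", "type": "subdistrict", "lat": 24.00, "lon": 90.00},
    {"id": "B", "type": "tertiary", "lat": 23.00, "lon": 91.00},
]

print(in_surveillance_zone(24.05, 90.00, hospitals))  # roughly 5.6 km north of A
print(in_surveillance_zone(24.20, 90.00, hospitals))  # roughly 22 km north of A
```

Because the zone is the union of the buffers, a location covered by several hospitals is counted once.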

Figure 1: Map of 10-20-30-30km buffers around subdistrict-district-tertiary-icddr,b sentinel hospital sites, respectively.

Identifying greyspots in the distribution of V. cholerae

We use modeled estimates of V. cholerae seroincidence, which estimate the risk of infection within the previous year relative to the population-weighted mean, across a 5 km x 5 km grid of Bangladesh. These estimates were based on a nationally representative serosurvey of Bangladesh conducted in 70 communities in 2015. We measure uncertainty with entropy, which quantifies how uncertain we are that a seroincidence hotspot (RR > 2) is a true hotspot and that a coldspot (RR < 2) is truly not a hotspot (Shannon entropy adapted by https://www.medrxiv.org/content/10.1101/2020.01.10.20016964v1.full.pdf). Our sensitivity analysis shows… [Reason why we chose RR of 2 as the threshold for entropy]

Higher entropy corresponds to greater uncertainty about whether a grid cell is truly a hotspot or a coldspot, while lower entropy corresponds to greater certainty. Areas that both have large uncertainty in our seroincidence estimates and lie outside the cholera surveillance zone are “greyspots,” locations about which no current cholera-related information is known.
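As an illustration of the entropy idea, the sketch below computes the standard binary Shannon entropy of the event "RR > 2" from a set of relative risk draws for one grid cell. This is an assumption about the general form of the measure: the adapted version in the cited preprint may differ, and the function name and example draws are hypothetical.

```python
import math


def hotspot_entropy(rr_draws, threshold=2.0):
    """Binary Shannon entropy (bits) of the event RR > threshold across draws."""
    p = sum(rr > threshold for rr in rr_draws) / len(rr_draws)
    if p in (0.0, 1.0):
        return 0.0  # all draws agree, so there is no classification uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Hypothetical draws of relative risk for three grid cells.
confident_coldspot = [0.6, 0.7, 0.8, 0.9]
confident_hotspot = [2.4, 2.9, 3.1, 3.5]
ambiguous = [1.2, 1.8, 2.3, 2.7]

print(hotspot_entropy(confident_coldspot))  # 0.0
print(hotspot_entropy(confident_hotspot))   # 0.0
print(hotspot_entropy(ambiguous))           # 1.0
```

Under this form, entropy is 0 when every draw falls on the same side of the threshold and reaches its maximum of 1 when the draws are split evenly.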

[Edit entropy figure to highlight sero sites that are high in seroincidence and high in entropy]

Defining risk of infection with V. cholerae

Although we have great uncertainty about the relative seroincidence risk across Bangladesh, our previous modeling efforts represent the best information we have about cholera risk in a given location. We used these modeling outputs to characterize the disease risk of populations living in V. cholerae greyspots.

Using the modeled seroincidence risk of infection with V. cholerae, we calculate three quantitative measures at the 5 km x 5 km grid cell level to represent risk of infection:

  1. Relative risk: Median seroincidence risk at the grid cell level relative to the population-weighted mean seroincidence risk across Bangladesh. Relative risk of 1 indicates that the grid cell seroincidence risk is the same as the mean risk across the country.
  2. Proportion infected: Median estimated proportion of the grid cell population that was infected with V. cholerae in the last year.
  3. Absolute infections: Median estimated number of the grid cell population infected with V. cholerae in the last year.

For each of these measures, we identified thresholds by which we could partition grid cells into “high,” “moderate,” and “low” risk.
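The three measures can be sketched for a single grid cell as follows. This is a simplified illustration that assumes each cell has a set of draws of its annual infection risk and a population count. The function names, the example values, and the use of a single national mean (rather than one per draw) are assumptions, not the modeling code behind the estimates.

```python
from statistics import median


def population_weighted_mean(risks, populations):
    """Mean risk across cells, weighting each cell by its population."""
    return sum(r * p for r, p in zip(risks, populations)) / sum(populations)


def cell_risk_measures(draws, population, national_mean_risk):
    """Summarize one grid cell from draws of its annual infection risk."""
    return {
        "relative_risk": median(d / national_mean_risk for d in draws),
        "proportion_infected": median(draws),
        "absolute_infections": median(d * population for d in draws),
    }


# Hypothetical cells: (draws of infection risk, population).
cells = [([0.10, 0.12, 0.14], 1000), ([0.20, 0.30, 0.40], 3000)]
national = population_weighted_mean(
    [median(draws) for draws, _ in cells], [pop for _, pop in cells]
)
for draws, pop in cells:
    print(cell_risk_measures(draws, pop, national))
```

The example shows why the measures can rank cells differently: absolute infections scale with cell population, while relative risk and proportion infected do not.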

Relative risk

We examined the distribution of relative risk and the width of the relative risk confidence interval estimates to determine the range for identifying appropriate cutoffs.

## [1] "***** What is the distribution of relative risk estimates? *****"

## [1] "summary stats on median relative risk: deciles"
##        0%       10%       20%       30%       40%       50%       60%       70% 
## 0.2907897 0.6144528 0.6994404 0.7535107 0.7976431 0.8377182 0.8808166 0.9406474 
##       80%       90%      100% 
## 1.0436950 1.2220581 3.4658593
## [1] "What is the range of the relative risk estimates, where range is the width of the 95% CI?"

## [1] "summary stats on 1/2 range of RR: deciles"
##       0%      10%      20%      30%      40%      50%      60%      70% 
## 0.435140 1.085426 1.326346 1.476075 1.579364 1.661535 1.751240 1.844823 
##      80%      90%     100% 
## 1.954936 2.148030 3.391650
## [1] "Decision: Confidence interval ranges are too large to be useful. Arbitrarily choose relative risks of 0.8 and 1.2 as cutoffs"

We found that the confidence intervals were too wide to be useful for this purpose: the median half-width of the 95% interval (1.66) was about twice the median relative risk itself (0.84), indicating very high uncertainty in the relative seroincidence risk. Instead, we arbitrarily chose relative risks of 0.8 and 1.2 as the cutoffs for moderate and high risk, respectively.
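A minimal sketch of this binning rule is shown below. Whether a cell exactly at a cutoff falls into the upper or lower category is not stated in the text, so treating the cutoffs as inclusive lower bounds is an assumption, as is the function name.

```python
def classify_relative_risk(rr, moderate_cutoff=0.8, high_cutoff=1.2):
    """Bin a median relative risk into low, moderate, or high."""
    if rr >= high_cutoff:
        return "high"
    if rr >= moderate_cutoff:
        return "moderate"
    return "low"


# Selected deciles of the median relative risk reported above (30%, 50%, 90%).
for rr in (0.7535107, 0.8377182, 1.2220581):
    print(rr, classify_relative_risk(rr))
```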

Proportion infected

We examined the distribution of the median proportion of the population infected and chose the 30th and 70th percentiles as cutoffs for moderate and high risk.

## [1] "***** What is the distribution of median proportion (of the population) infected within the last year? *****"

## [1] "summary stats on median proportion infected: deciles"
##         0%        10%        20%        30%        40%        50%        60% 
## 0.04837716 0.10571532 0.11938902 0.12879537 0.13592896 0.14250386 0.15036977 
##        70%        80%        90%       100% 
## 0.16028699 0.17833605 0.20948973 0.59558562
## [1] "Decision: Use 30th and 70th percentiles as cutoffs"
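The percentile-based rule can be sketched as below, using hypothetical cell-level values rather than the modeled estimates. The interpolation method and the handling of values exactly at a cutoff are assumptions. The same approach applies to the absolute number of infections.

```python
from statistics import quantiles


def percentile_cutoffs(values):
    """Return the 30th and 70th percentiles of a list of cell-level values."""
    # n=10 yields nine cut points (10%, 20%, ..., 90%).
    deciles = quantiles(values, n=10, method="inclusive")
    return deciles[2], deciles[6]


def classify_by_cutoffs(value, low_cut, high_cut):
    """Bin a value into low, moderate, or high using two cutoffs."""
    if value >= high_cut:
        return "high"
    if value >= low_cut:
        return "moderate"
    return "low"


# Hypothetical median proportions infected for 11 grid cells.
proportions = [0.05, 0.08, 0.10, 0.12, 0.13, 0.14, 0.15, 0.16, 0.18, 0.21, 0.60]
low_cut, high_cut = percentile_cutoffs(proportions)
print(low_cut, high_cut)
print([classify_by_cutoffs(p, low_cut, high_cut) for p in proportions])
```

Unlike the fixed relative risk cutoffs, percentile cutoffs always place about 30% of grid cells in the low and high categories, although the share of the population in each category can differ.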

Absolute infections

We examined the distribution of the median number of absolute infections on the log10 scale and chose the 30th and 70th percentiles as cutoffs for moderate and high risk.

## [1] "***** What is the distribution of median infections within the last year? *****"

## [1] "summary stats on median infections: deciles"
##              0%             10%             20%             30%             40% 
##      0.03424486    272.27551779    914.46268744   1906.48006982   2506.88484939 
##             50%             60%             70%             80%             90% 
##   3053.29933487   3574.13499636   4187.92561866   4979.59186066   6298.51464723 
##            100% 
## 230436.47893718
## [1] "Decision: Use 30th and 70th percentiles as cutoffs"

Results

Population in the cholera surveillance zone

We overlaid the cholera surveillance zone over population data in Bangladesh (original source: 2015 100m WorldPop estimates) to estimate the number of people living within the cholera surveillance zone (Table 2, Figure 3).

Table 2: Population living in the cholera surveillance zone.

Buffer (Subdistrict-District-Tertiary-icddr,b in km) | Pop | Proportion
10-20-30-30 | 50888584 | 0.3129697

Identifying greyspots outside the cholera surveillance zone

Areas that both have large uncertainty in our seroincidence estimates and lie outside the cholera surveillance zone are “greyspots,” locations about which no cholera-related information is known (Figure 4). Here we define high uncertainty as an entropy greater than 0.2, 0.3, or 0.4 to illustrate how the distribution of greyspots changes as our threshold for uncertainty changes.

Figure 4: Cholera greyspots. The 5 km x 5 km grid cells that are colored are cells where we have reasonably confident seroincidence estimates (A. entropy < 0.2, B. entropy < 0.3, C. entropy < 0.4) or that are in the cholera surveillance zone. Grid cells in grey are greyspots, locations where we have almost no cholera information.
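The greyspot rule can be expressed as a short sketch. The cell identifiers and values are hypothetical, and treating a cell with entropy exactly equal to the threshold as not uncertain is an assumption.

```python
def is_greyspot(entropy, in_zone, entropy_threshold):
    """A greyspot has uncertain estimates and lies outside the surveillance zone."""
    return entropy > entropy_threshold and not in_zone


# Hypothetical grid cells: (cell id, entropy, inside surveillance zone?).
cells = [
    ("c1", 0.10, False),
    ("c2", 0.25, False),
    ("c3", 0.35, False),
    ("c4", 0.90, True),
    ("c5", 0.90, False),
]

for threshold in (0.2, 0.3, 0.4):
    grey = [cid for cid, h, in_zone in cells if is_greyspot(h, in_zone, threshold)]
    print(threshold, grey)
```

In this example the set of greyspots shrinks from three cells to one as the threshold rises from 0.2 to 0.4, and the high-entropy cell inside the surveillance zone is never a greyspot.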

1. Relative seroincidence risk

Using data from a nationally representative survey across Bangladesh, we developed maps of infection across 5 km x 5 km grid cells in Bangladesh. These maps estimate that there were approximately 22.5 million infections (22,488,913) in Bangladesh in the year before the 2015 serosurvey data collection, out of an estimated population of approximately 162.6 million (162,599,071). We standardized the seroincidence estimate in each cell by the population-weighted mean infection risk across Bangladesh. This yielded a relative risk of infection for each grid cell (Figure 5).

Figure 5: Median risk of V. cholerae seroincidence relative to a population-weighted mean by 5 km x 5 km grid cell. These relative risk estimates are bounded such that RRs above and below 2 and -2 were plotted as the values 2 and -2, respectively. The black marks indicate sentinel hospital locations.

We describe the population living in each risk category (Table 3).

Table 3: Population living in each risk category. The population living in high, moderate, and low risk areas according to relative seroincidence risk across Bangladesh.

Risk Level | Population
High | 13923615
Moderate | 60146326
Low | 88529130

We redrew the relative seroincidence risk map after binning grid cells into high, moderate, and low risk categories (Figure 6).

Figure 6: Cholera risk map as categorized by the risk of seroincidence relative to a population-weighted mean by 5 km x 5 km grid cell.

Relative risk in the cholera surveillance zone

We overlaid the cholera surveillance zone with the binned relative seroincidence risk maps to examine the distribution of risks in the surveilled areas (Figures 7 and 8).

Figure 7: Relative seroincidence risk within cholera surveillance zones (10, 20, 30, and 30 km buffers for subdistrict, district, tertiary care, and icddr,b hospitals, respectively).

Figure 8: Relative seroincidence risk outside of cholera surveillance zones (10, 20, 30, and 30 km buffers for subdistrict, district, tertiary care, and icddr,b hospitals, respectively).

Populations in the cholera surveillance zone

We examined the estimated number of infections, the percent infected in the cholera surveillance zone, and the percent of Bangladesh infections captured in the cholera surveillance zone (Table 4).

Table 4: Number and percent of infections that may be captured in cholera surveillance zones. The percent infected in the surveillance zone population is the number infected divided by the surveillance zone population. The percent of all BGD infections is the percentage of infected individuals captured within the cholera surveillance zone out of all infected individuals in Bangladesh.

Buffer size | Surv. zone pop | Number infected | % infected in surv. zone pop | % of all BGD infections
10-20-30-30 | 50888584 | 6393194 | 12.56 | 28.43
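The percentages in Table 4 follow directly from the counts reported in the text. The short check below uses the reported values rounded to whole numbers. The variable names are illustrative.

```python
# Figures reported in the text and in Tables 2 and 4, rounded to whole numbers.
zone_pop = 50_888_584
zone_infections = 6_393_194
national_pop = 162_599_071
national_infections = 22_488_913

share_of_population = zone_pop / national_pop
pct_infected_in_zone = 100 * zone_infections / zone_pop
pct_of_all_infections = 100 * zone_infections / national_infections

print(f"{share_of_population:.4f}")    # 0.3130
print(f"{pct_infected_in_zone:.2f}")   # 12.56
print(f"{pct_of_all_infections:.2f}")  # 28.43
```

The surveillance zone therefore holds about 31% of the population but about 28% of estimated infections, so the infection risk inside the zone is slightly below the national average.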

We then examined the distribution of relative-risk-based categories (Table 5).

Table 5: Number and percent of infections that may be captured in cholera surveillance zones, categorized by relative risk. The percentage of surveillance zone infections is the percentage of infected people in high/moderate/low risk grid cells among all infections within the cholera surveillance zone. The percentage of surveillance zone population is the percentage of people living in high/moderate/low risk grid cells among all people within the cholera surveillance zone. The distribution across risk categories should sum to 100% for each set of buffer sizes.

Buffer size | Risk category | Number infected | % surv. zone infections | Surv. zone pop | % surv. zone pop
10-20-30-30 | High | 1180202 | 18.46 | 4903744 | 9.64
10-20-30-30 | Moderate | 2121914 | 33.19 | 12580679 | 24.72
10-20-30-30 | Low | 3091078 | 48.35 | 33404161 | 65.64
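The within-zone distributions in Table 5 can be reproduced from the category counts, which also confirms that the categories sum to the surveillance zone totals. The variable names are illustrative.

```python
# Counts from Table 5: risk category -> (infections, population) inside the zone.
zone_counts = {
    "High": (1_180_202, 4_903_744),
    "Moderate": (2_121_914, 12_580_679),
    "Low": (3_091_078, 33_404_161),
}

total_infections = sum(inf for inf, _ in zone_counts.values())
total_population = sum(pop for _, pop in zone_counts.values())

shares = {
    category: (100 * inf / total_infections, 100 * pop / total_population)
    for category, (inf, pop) in zone_counts.items()
}
for category, (inf_pct, pop_pct) in shares.items():
    print(f"{category}: {inf_pct:.2f}% of infections, {pop_pct:.2f}% of population")
```

High-risk cells hold under 10% of the zone's population but over 18% of its infections, which is the pattern expected if the relative risk categories are informative.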

Populations by relative risk across Bangladesh

We sought to describe how well the cholera surveillance zones capture the high-, moderate-, and low-risk populations of Bangladesh.

We summarized the percentage of high, moderate, and low infection risk populations in Bangladesh that would be captured by cholera surveillance zones at different buffer sizes when risk was categorized by relative risk (Table 6).

Table 6: Number and percent infections that may be captured in Bangladesh, categorized by relative risk. The captured at-risk population represents the percentage of high/moderate/low risk populations captured by the cholera surveillance zone out of all high/moderate/low risk populations in Bangladesh. The captured infections represents the percentage of infections in high/moderate/low risk grid cells among all infections in high/moderate/low risk grid cells across Bangladesh.

Buffer size | Risk category | Surv. zone pop | Surv. zone infections | Captured At-Risk Pop (%) | Captured Infections (%)
10-20-30-30 | High | 4903744 | 1180202 | 35.22 | 32.37
10-20-30-30 | Moderate | 12580679 | 2121914 | 20.92 | 22.41
10-20-30-30 | Low | 33404161 | 3091078 | 37.73 | 32.97
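The captured at-risk population column can be reproduced by dividing the population in each risk category inside the surveillance zone by the national population in the same category (Table 3). The variable names are illustrative. The captured infections column cannot be checked the same way because national infection counts by category are not tabulated here.

```python
# Population inside the surveillance zone by risk category (Table 6).
zone_pop = {"High": 4_903_744, "Moderate": 12_580_679, "Low": 33_404_161}
# Population in each risk category across Bangladesh (Table 3).
national_pop = {"High": 13_923_615, "Moderate": 60_146_326, "Low": 88_529_130}

captured_pct = {c: 100 * zone_pop[c] / national_pop[c] for c in zone_pop}
for category, pct in captured_pct.items():
    print(f"{category}: {pct:.2f}%")
```

Unlike the within-zone distribution, these percentages use a different denominator for each category and so do not sum to 100%.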

2. Estimated proportion of infections

The second measure of risk we examine to evaluate the surveillance system is the estimated proportion of the grid cell population that was infected with V. cholerae in the year prior to data collection. This yielded the median estimated proportion of infections for each grid cell (Figure 9).

Figure 9: Median estimated proportion of grid cell population infected with V. cholerae in the previous year. The black marks indicate sentinel hospital locations.

We describe the population living in each risk category defined by the 30th and 70th percentiles of the estimated proportion of infections in each grid cell (Table 7).

Table 7: Population living in each risk category. The population living in high, moderate, and low risk areas according to the estimated proportion of infections across Bangladesh.

Risk Level | Population
High | 35633613
Moderate | 52996224
Low | 73969233

We redrew the estimated proportion of infections risk map after binning grid cells into high, moderate, and low risk categories (Figure 10).

Figure 10: Cholera risk map as categorized by the estimated proportion of V. cholerae infections by 5 km x 5 km grid cell.

Risk categories by proportion infected in the cholera surveillance zone

We overlaid the cholera surveillance zone with the binned infection proportion risk maps to examine the distribution of risks in the surveilled areas (Figures 11 and 12).

Figure 11: Infection proportion risk categories within cholera surveillance zones (10, 20, 30, and 30 km buffers for subdistrict, district, tertiary care, and icddr,b hospitals, respectively).

Figure 12: Infection proportion risk categories outside of cholera surveillance zones (10, 20, 30, and 30 km buffers for subdistrict, district, tertiary care, and icddr,b hospitals, respectively).

Populations in the cholera surveillance zone

We examined the estimated number of infections, the percent infected in the cholera surveillance zone, and the percent of Bangladesh infections captured in the cholera surveillance zone (Table 8).

Table 8: Number and percent of infections that may be captured in cholera surveillance zones. The percent infected in the surveillance zone population is the number infected divided by the surveillance zone population. The percent of all BGD infections is the percentage of infected individuals captured within the cholera surveillance zone out of all infected individuals in Bangladesh.

Buffer size | Surv. zone pop | Number infected | % infected in surv. zone pop | % of all BGD infections
10-20-30-30 | 50888584 | 6393194 | 12.56 | 28.43

We then examined the distribution of categories (Table 9) based on the proportion of infections.

Table 9: Number and percent of infections that may be captured in cholera surveillance zones, categorized by the proportion infected. The percentage of surveillance zone infections is the percentage of infected people in high/moderate/low risk grid cells among all infections within the cholera surveillance zone. The percentage of surveillance zone population is the percentage of people living in high/moderate/low risk grid cells among all people within the cholera surveillance zone. The distribution across risk categories should sum to 100% for each set of buffer sizes.

Buffer size | Risk category | Number infected | % surv. zone infections | Surv. zone pop | % surv. zone pop
10-20-30-30 | High | 2676588 | 41.87 | 13179025 | 25.90
10-20-30-30 | Moderate | 872659 | 13.65 | 6169028 | 12.12
10-20-30-30 | Low | 2843947 | 44.48 | 31540531 | 61.98

Populations by proportion infected across Bangladesh

We sought to describe how well the cholera surveillance zones capture the high-, moderate-, and low-risk populations of Bangladesh.

We summarized the percentage of high, moderate, and low infection risk populations in Bangladesh that would be captured by cholera surveillance zones at different buffer sizes when risk was categorized by the infection proportion (Table 10).

Table 10: Number and percent of infections that may be captured in Bangladesh, categorized by the proportion of individuals infected with V. cholerae. The captured at-risk population represents the percentage of high/moderate/low risk populations captured by the cholera surveillance zone out of all high/moderate/low risk populations in Bangladesh. The captured infections represent the percentage of infections in high/moderate/low risk grid cells among all infections in high/moderate/low risk grid cells across Bangladesh.

Buffer size | Risk category | Surv. zone pop | Surv. zone infections | Captured At-Risk Pop (%) | Captured Infections (%)
10-20-30-30 | High | 13179025 | 2676588 | 36.98 | 35.60
10-20-30-30 | Moderate | 6169028 | 872659 | 11.64 | 11.59
10-20-30-30 | Low | 31540531 | 2843947 | 42.64 | 38.22

3. Number of V. cholerae infections

The third measure of risk we examine to evaluate the surveillance system is the median estimated number of V. cholerae infections in each grid cell (Figure 13).

Figure 13: Median number of estimated V. cholerae infections per grid cell in the previous year. The black marks indicate sentinel hospital locations.

We describe the population living in each risk category defined by the 30th and 70th percentiles of the number of infections in each grid cell (Table 11).

Table 11: Population living in each risk category. The population living in high, moderate, and low risk areas according to the estimated number of infections across Bangladesh.

Risk Level | Population
High | 96673622
Moderate | 56529584
Low | 9395865

We redrew the estimated number of infections risk map after binning grid cells into high, moderate, and low risk categories (Figure 14).

Figure 14: Cholera risk map as categorized by the estimated number of V. cholerae infections by 5 km x 5 km grid cell.

Risk categories by number of infections in the cholera surveillance zone

We overlaid the cholera surveillance zone with the binned number of infections risk map to examine the distribution of risk in the surveilled areas (Figures 15 and 16).

Figure 15: Number of infections risk categories within cholera surveillance zones (10, 20, 30, and 30 km buffers for subdistrict, district, tertiary care, and icddr,b hospitals, respectively).

Figure 16: Number of infections risk categories outside of cholera surveillance zones (10, 20, 30, and 30 km buffers for subdistrict, district, tertiary care, and icddr,b hospitals, respectively).

Populations in the cholera surveillance zone

We examined the estimated number of infections, the percent infected in the cholera surveillance zone, and the percent of Bangladesh infections captured in the cholera surveillance zone (Table 12).

Table 12: Number and percent of infections that may be captured in cholera surveillance zones. The percent infected in the surveillance zone population is the number infected divided by the surveillance zone population. The percent of all BGD infections is the percentage of infected individuals captured within the cholera surveillance zone out of all infected individuals in Bangladesh.

Buffer size | Surv. zone pop | Number infected | % infected in surv. zone pop | % of all BGD infections
10-20-30-30 | 50888584 | 6393194 | 12.56 | 28.43

We then examined the distribution of risk categories (Table 13) based on the number of infections.

Table 13: Number and percent of infections that may be captured in cholera surveillance zones, categorized by the number of infections. The percentage of surveillance zone infections is the percentage of infected people in high/moderate/low risk grid cells among all infections within the cholera surveillance zone. The percentage of surveillance zone population is the percentage of people living in high/moderate/low risk grid cells among all people within the cholera surveillance zone. The distribution across risk categories should sum to 100% for each set of buffer sizes.

Buffer size | Risk category | Number infected | % surv. zone infections | Surv. zone pop | % surv. zone pop
10-20-30-30 | High | 4813655 | 75.29 | 38488020 | 75.63
10-20-30-30 | Moderate | 1431566 | 22.39 | 11344433 | 22.29
10-20-30-30 | Low | 147973 | 2.31 | 1056131 | 2.08

Populations by number of infections across Bangladesh

We sought to describe how well the cholera surveillance zones capture the high-, moderate-, and low-risk populations of Bangladesh.

We summarized the percentage of high, moderate, and low infection risk populations in Bangladesh that would be captured by cholera surveillance zones at different buffer sizes when risk was categorized by the number of V. cholerae infections (Table 14).

Table 14: Number and percent of infections that may be captured in Bangladesh, categorized by the number of individuals infected with V. cholerae. The captured at-risk population represents the percentage of high/moderate/low risk populations captured by the cholera surveillance zone out of all high/moderate/low risk populations in Bangladesh. The captured infections represent the percentage of infections in high/moderate/low risk grid cells among all infections in high/moderate/low risk grid cells across Bangladesh.

Buffer size | Risk category | Surv. zone pop | Surv. zone infections | Captured At-Risk Pop (%) | Captured Infections (%)
10-20-30-30 | High | 38488020 | 4813655 | 39.81 | 35.20
10-20-30-30 | Moderate | 11344433 | 1431566 | 20.07 | 19.07
10-20-30-30 | Low | 1056131 | 147973 | 11.24 | 11.32

Discussion